How to Use Learn-To-Rank Features
NOTE: Learn-To-Rank works only with SmartAnalytics v5.0 and later versions.
About Learn-To-Rank
The Learn-To-Rank component is implemented on top of SmartAnalytics data to support four main features:
- Results Boosting
- User segmentation boosting
- Query suggestions
- Content suggestions
Learn-to-Rank uses clustering algorithms (K-means, but extendable to any other algorithm) to build clusters of similar (related) queries and their associated “important” documents.
- Learn-To-Rank groups queries and documents based on query similarity as well as user interaction with documents after a query is executed.
- These groups are called "clusters".
In a cluster, there are:
- Queries that contain similar keywords
- Documents that those queries led to
- Other queries that do not necessarily share keywords, but led to the same documents
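
The three kinds of cluster members listed above can be pictured with a minimal sketch. It is not the product's implementation: it replaces K-means with a simple keyword-overlap (Jaccard) test purely for illustration, and the `Cluster` class, the `build_cluster` function, the `min_overlap` cutoff, and the sample data (including the file name `research.pdf`) are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Cluster:
    """Hypothetical container mirroring the three kinds of cluster members."""
    similar_queries: set = field(default_factory=set)
    documents: set = field(default_factory=set)
    inherited_queries: set = field(default_factory=set)


def keyword_overlap(a: str, b: str) -> float:
    """Jaccard overlap of two queries' keyword sets (a stand-in for K-means similarity)."""
    ka, kb = set(a.lower().split()), set(b.lower().split())
    return len(ka & kb) / len(ka | kb) if ka | kb else 0.0


def build_cluster(seed: str, interactions: list, min_overlap: float = 0.3) -> Cluster:
    """Build one cluster around a seed query from (query, document) pairs."""
    cluster = Cluster()
    # 1. Queries whose keywords are similar to the seed query.
    for query, _ in interactions:
        if keyword_overlap(seed, query) >= min_overlap:
            cluster.similar_queries.add(query)
    # 2. Documents that those queries led to.
    for query, doc in interactions:
        if query in cluster.similar_queries:
            cluster.documents.add(doc)
    # 3. Other queries that led to the same documents.
    for query, doc in interactions:
        if doc in cluster.documents and query not in cluster.similar_queries:
            cluster.inherited_queries.add(query)
    return cluster


# Illustrative data only.
interactions = [
    ("drug recall policy", "Drug Recall Policy.pdf"),
    ("recall policy", "Drug Recall Policy.pdf"),
    ("fraud policy", "Anti_fraud_and_Fraud_policy.pdf"),
    ("withdrawn medication", "Drug Recall Policy.pdf"),
    ("biomedical research", "research.pdf"),
]
cluster = build_cluster("drug recall policy", interactions)
print(cluster)
```

Here "withdrawn medication" shares no keywords with the seed query, yet it joins the cluster because it led to the same document.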
Learn-To-Rank Page
The Learn-To-Rank page is available from the left panel of the SmartHub Administration page (https://<hostname>:<port>/_admin).
The page is composed of two parts:
- Features Configuration
- Training Results (data)
Features Configuration
In the table below, you will find the Learn-To-Rank settings as they appear in the SmartHub interface. See the Training Results section below for more details and examples of the settings shown here.
- These settings must be fine-tuned over time to fit your environment and data
- Properly tuned settings reveal the most relevant documents within a cluster and search query
Setting | Default | Description |
---|---|---|
Learn to Rank Index Name | learntorank-cluster-storage | The name of the Learn-To-Rank index used for cluster storage. Note: If you change the index name after an index has already been created, the old index is not removed from Elastic. |
Number of days to go back for data | 30000 | Total number of days for query analytics data freshness. |
Number of actions threshold | 20 | The minimum number of executed queries in order to be used for training. This number must fit your environment. |
Max number of clusters | 10 | Total number of clusters that you want your data to be split into. This number must fit your environment. |
Number of documents to be boosted | 2 | How many documents are boosted when your search query can be assigned to a cluster. |
Data cache sliding expiration in minutes | 30 minutes | Sliding expiration of data to be stored in cache. |
Search engine URL property used for boosting | clickUri | The property that your backend uses for the path. |
User segmentation fields and weights | LTR Segmentation Field Name, User Profile Field Mapping, 1 | Specifies the user segmentation boosting settings, which boost documents that were interacted with by users with similar classifications, such as role, location, job type, etc. The format for this setting is: LTR Segmentation Field Name, User Profile Field Mapping (optional), Weight (optional, default is 1). You can specify multiple Learn-To-Rank segmentation fields by separating them with a semicolon (;). For example: Department,Department,10;JobTitle,SPS-JobTitle,5 |
Clear cache | N/A | Clears the cache for the existing clusters. |
Scheduled Task Name | BAInsight Learn to Rank Scheduler | The name of the scheduled task used to run the LTR trainer. |
Scheduled Task Run Interval | 7 | |
Enable | true | Set this to true if you want the data to be trained automatically. |
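
The format of the User segmentation fields and weights setting can be illustrated with a small parser. This is only a sketch of the documented format, not SmartHub code: the `SegmentationField` type and `parse_segmentation` function are hypothetical, and it assumes that an omitted mapping falls back to the field name and that weights are whole numbers.

```python
from typing import NamedTuple


class SegmentationField(NamedTuple):
    field_name: str       # LTR Segmentation Field Name
    profile_mapping: str  # User Profile Field Mapping
    weight: int           # Weight


def parse_segmentation(setting: str) -> list:
    """Parse 'Name,Mapping,Weight;Name,Mapping,Weight' into structured entries."""
    fields = []
    for entry in setting.split(";"):
        parts = [p.strip() for p in entry.split(",")]
        if not parts[0]:
            continue  # skip empty entries, e.g. after a trailing semicolon
        name = parts[0]
        # Assumption: an omitted mapping falls back to the field name itself.
        mapping = parts[1] if len(parts) > 1 and parts[1] else name
        # Documented default: the weight is 1 when omitted.
        weight = int(parts[2]) if len(parts) > 2 and parts[2] else 1
        fields.append(SegmentationField(name, mapping, weight))
    return fields


parsed = parse_segmentation("Department,Department,10;JobTitle,SPS-JobTitle,5")
for f in parsed:
    print(f.field_name, f.profile_mapping, f.weight)
```

With the example value from the table, this yields two entries: Department with weight 10 and JobTitle (mapped to SPS-JobTitle) with weight 5.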
Learn-To-Rank Trainer
The trainer takes the data from SmartAnalytics and creates the clusters (shown in the screenshot in section "Training Results" below).
- The trainer is in the SmartHub package under /Scheduled Jobs/LearnToRank/Task. Its settings are in the file BAInsight.LearnToRank.Trainer.exe.config, shown below.
- If you change the path of the logger configuration file folder (/Caching), you must change it in the trainer configuration as well.
<appSettings>
<add key="LoggingFile" value="Logs.xml" />
<add key="LoggingOutputDir" value=".\Logs\" />
<add key="log4net.Config.Watch" value="True" />
<add key="ConfigFolder" value="../../../Configuration/" />
<add key="OAuthFolder" value="../../../OAuth/" />
<add key="CachingFolder" value="../../../Caching/" />
</appSettings>
- If the installation is successful, a new scheduled task is created in the Windows Task Scheduler.
- The name of the task is the one specified in the Scheduled Task Name field from the Learn-To-Rank page.
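
When folder paths change, it can help to list what the trainer configuration currently points to. The following sketch reads the appSettings fragment shown above using Python's standard library. The `read_app_settings` helper is hypothetical, and the assumption that the section may sit inside a larger root element of the real .config file is not confirmed by this page.

```python
import xml.etree.ElementTree as ET

# Fragment copied from the trainer configuration shown above.
APP_SETTINGS = r"""
<appSettings>
<add key="LoggingFile" value="Logs.xml" />
<add key="LoggingOutputDir" value=".\Logs\" />
<add key="log4net.Config.Watch" value="True" />
<add key="ConfigFolder" value="../../../Configuration/" />
<add key="OAuthFolder" value="../../../OAuth/" />
<add key="CachingFolder" value="../../../Caching/" />
</appSettings>
"""


def read_app_settings(xml_text: str) -> dict:
    """Return the key/value pairs of every <add> element under <appSettings>."""
    root = ET.fromstring(xml_text.strip())
    # Assumption: in the full file, <appSettings> may be nested under another root.
    section = root if root.tag == "appSettings" else root.find(".//appSettings")
    if section is None:
        return {}
    return {add.get("key"): add.get("value") for add in section.findall("add")}


settings = read_app_settings(APP_SETTINGS)
# Keep only the folder paths that must stay in sync with the SmartHub installation.
folders = {key: value for key, value in settings.items() if key.endswith("Folder")}
print(folders)
```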
Training Results
Clusters might contain documents that are deleted in the search index.
- This is because the Analytics index still contains them.
- Over time, they disappear from this list as usage of other documents increases.
- To accelerate the process, you can manually delete them from the Analytics index and retrain the data.
Sample training results are shown below.
- Clusters - #1, #2, #3, #4
- Each cluster consists of:
  - A series of queries, shown on the left
  - Document URLs for each query, listed on the right
  - The number of actions taken on each document (downloading, opening, previewing, etc.), on the far right
- Queries are grouped into clusters based on their keyword similarity
- Each cluster consists of:
  - Bold text - policy matterid=333056, diabetes treatment, biomedical research, albert gore
    - This is the original cluster query.
    - The documents this query led to, and on which at least the threshold number of actions were taken, are added to the cluster.
    - The threshold is set in the table above, in Number of actions threshold.
    - It is set to a very low value of 4 in this sample because the data set is small.
    - For example, in Cluster #1, the query text policy matterid=333056 led to the documents shown in Cluster #1 below, "Drug Recall Policy.pdf," "Anti_fraud_and_Fraud_policy.pdf," etc.
    Note: Production environments will most likely have a document threshold in the thousands.
  - Plain text queries
    - The plain text, unbolded queries on the left, under the bold text query, are queries that are pulled or "inherited" based on the document list on the right side.
    - These are the top queries that users have run to discover and take action on the documents listed on the right.
    - In other words, a "backwards looking" query on the documents shown yields these queries, which are listed under the bold, original cluster query.
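
The role of the actions threshold can be sketched as follows. This is an assumption-laden illustration rather than the trainer's actual logic. The function `documents_above_threshold` is hypothetical, the threshold is treated as "at least this many actions", and the action counts and the file name "Rarely opened.pdf" are invented and not taken from the sample screenshot.

```python
from collections import Counter

# Illustrative interaction log: one (query, document) pair per user action.
ACTIONS = (
    [("policy matterid=333056", "Drug Recall Policy.pdf")] * 5
    + [("policy matterid=333056", "Anti_fraud_and_Fraud_policy.pdf")] * 4
    + [("policy matterid=333056", "Rarely opened.pdf")] * 2
)


def documents_above_threshold(actions, query, threshold):
    """Count actions per document for one query and keep those reaching the threshold."""
    counts = Counter(doc for q, doc in actions if q == query)
    # Assumption: the threshold is inclusive (>=).
    return {doc: n for doc, n in counts.items() if n >= threshold}


kept = documents_above_threshold(ACTIONS, "policy matterid=333056", threshold=4)
print(kept)
```

With the sample threshold of 4, the two frequently used documents stay in the cluster and the rarely used one is left out. With the default of 20, none of these sample documents would qualify.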
Features
Learn-To-Rank results and user segmentation boosting
During a search, this stage checks whether the query matches any of the clusters. If it does, the stage boosts the documents in that cluster according to their hits (number of clicks, previews, etc.). The documents selected for boosting are those that the current query (or a very similar query) led to, or those that were interacted with by users with similar classifications, such as role, location, job type, etc.
- The boost value is proportional to the number of actions on each document that the current query led to, and falls within the 1-100 range.
- To use Learn-To-Rank results and user segmentation boosting:
  - Create a stage with empty parameters.
  - The stage must be the first stage in the list of Tuning stages in the "Query Tuning" section.
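
One way to picture a boost that is proportional to action counts and stays within the 1-100 range is the sketch below. The exact formula is not documented here, so the scaling against the most-used document is an assumption, and `boost_values` is a hypothetical function. The `max_boosted` parameter stands in for the Number of documents to be boosted setting (default 2), and the action counts and "Rarely opened.pdf" are invented sample data.

```python
def boost_values(doc_actions: dict, max_boosted: int = 2) -> dict:
    """Scale action counts to boosts in the 1-100 range for the top documents."""
    if not doc_actions:
        return {}
    # Keep only the most-used documents, up to the configured number.
    top = sorted(doc_actions.items(), key=lambda kv: kv[1], reverse=True)[:max_boosted]
    # Assumption: the most-used document receives the maximum boost of 100.
    highest = max(top[0][1], 1)
    return {doc: max(1, min(100, round(100 * n / highest))) for doc, n in top}


boosts = boost_values({
    "Drug Recall Policy.pdf": 40,
    "Anti_fraud_and_Fraud_policy.pdf": 10,
    "Rarely opened.pdf": 4,
})
print(boosts)
```

In this example the document with 40 actions gets a boost of 100, the one with 10 actions gets 25, and the third document is not boosted because only two documents are selected.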
Learn-To-Rank Query Suggestions
This feature provides suggestions as query text is entered in the search field.
- The Learn-To-Rank Query Suggestions provider is located under TypeAhead.
To enable and use this TypeAhead provider:
- As a SmartHub administrator, click the UI Editor link on the SmartHub Administration page.
- Click the Select a page link in the top menu.
- Select (double-click) the page (Index.html, landing.html, etc.) you wish to modify.
- Below, the Results.html page is shown for sample purposes.
- Select the Customize type ahead link at the top of the page.
- Type-ahead providers are listed under Settings on the left side.
- Select the settings gear icon of the LearnToRankSuggestions provider to open the Type-ahead providers settings window.
- Modify the settings as needed. For details about each setting, see the table Type-Ahead Settings.
- Click Apply.
- Click Save changes.
Learn-To-Rank Content Suggestions
This feature provides the user with similar (search) results, excluding those present on the current page.
This is also used in the component Similar Documents.
- For more about Content-By-Search, see How Users Can Personalize Their Search Results.
- Learn To Rank Results Suggestions tuning stage
  - To use LTR Results Suggestion, create a stage with empty Parameters.
  - The stage must be first in the list of stages under the Query Tuning section of the SmartHub Administration UI.
- The Learn-To-Rank Similar Results module is located under <SmartHub installation>/modules/LearnToRank.
  - In this module, a Learn-To-Rank settings file contains the ID of your Content-By-Search (Learn-To-Rank element) and the URL property.
  - This ID can be modified.
NOTE:
Be aware of the order of relevancy tuning stages!
Learn-To-Rank works only with SmartAnalytics v5.0 or above.